Method for cloning and producing the BssHII restriction endonuclease in E. coli

ABSTRACT

The present invention relates to the cloning in E. coli of recombinant DNA molecules encoding a multi-specific methylase gene (bssHIIM1), the BssHII restriction endonuclease gene (bssHIIR), and the cognate BssHII methylase gene (bssHIIM2) from Bacillus stearothermophilus H3. The BssHII multi-specific methylase gene was first cloned in a Sau3AI library using a modified pLITMUS28 vector (New England Biolabs, Inc., Beverly, Mass.) with two BssHII sites. Expression of the multi-specific BssHII methylase renders the two BssHII sites resistant to BssHII digestion. Surprisingly, the cloned methylase also modifies some other sites in addition to the BssHII site (5'GCGCGC3'). The methylase also modifies the BsrFI site (5'RCCGGY3') and the HaeII site (5'RGCGCY3'), and partially modifies the EagI site (5'CGGCCG3') and the MluI site (5'ACGCGT3'). The beginning of the bssHIIR gene was cloned by PCR using two degenerate primers based on the N-terminal amino acid sequence. The rest of the bssHIIR gene was cloned by inverse PCR. The cognate bssHIIM2 gene was cloned by inverse PCR and PCR. The BssHII restriction endonuclease gene was expressed in E. coli host ER2417 carrying three plasmids: pLysP, pLG-BssHIIM2, and pET21AT-BssHIIR.

BACKGROUND OF THE INVENTION

The present invention relates to recombinant DNA which encodes the BssHII restriction endonuclease as well as BssHII methylase, and the production of BssHII restriction endonuclease from the recombinant DNA.

Type II restriction endonucleases are a class of enzymes that occur naturally in bacteria. When they are purified away from other bacterial components, restriction endonucleases can be used in the laboratory to cleave DNA molecules into precise fragments for molecular cloning and gene characterization.

Restriction endonucleases act by recognizing and binding to particular sequences of nucleotides (the `recognition sequence`) along the DNA molecule. Once bound, they cleave the molecule within, or to one side of, the recognition sequence. Different restriction endonucleases have affinity for different recognition sequences. Over two hundred and eleven restriction endonucleases with unique specificities have been identified among the many hundreds of bacterial species that have been examined to date (Roberts and Macelis, Nucl. Acids Res. 24:223-235, (1996)).

Bacteria tend to possess, at most, only a small number of restriction endonucleases per species. The endonucleases typically are named according to the bacteria from which they are derived. Thus, the species Deinococcus radiophilus, for example, produces three different restriction endonucleases, named DraI, DraII and DraIII. These enzymes recognize and cleave the sequences 5'TTTAAA3', 5'PuGGNCCPy3' and 5'CACNNNGTG3' respectively. Escherichia coli RY13, on the other hand, produces only one enzyme, EcoRI, which recognizes the sequence 5'GAATTC3'.

It is thought that in nature, restriction endonucleases play a protective role in the welfare of the bacterial cell. They enable bacteria to resist infection by foreign DNA molecules like viruses and plasmids that would otherwise destroy or parasitize them. They impart resistance by cleaving invading foreign DNA molecules each time that the recognition sequence occurs. The cleavage that takes place disables many of the infecting genes and renders the DNA susceptible to further degradation by non-specific nucleases.

Modification methylases are a second component of bacterial protective systems. These enzymes are complementary to restriction endonucleases and they provide the means by which bacteria are able to protect their own DNA and distinguish it from foreign, infecting DNA. Modification methylases recognize and bind to the same recognition sequence as the corresponding restriction endonuclease, but instead of cleaving the DNA, they chemically modify one particular nucleotide within the sequence by the addition of a methyl group. Following methylation, the recognition sequence is no longer cleaved by the restriction endonuclease. The DNA of a bacterial cell is always fully modified by virtue of the activity of its modification methylase. It is therefore completely insensitive to the presence of the endogenous restriction endonuclease. It is only unmodified, and therefore identifiably foreign, DNA that is sensitive to restriction endonuclease recognition and cleavage.

With the advent of genetic engineering technology, it is now possible to clone genes and to produce the proteins and enzymes that they encode in greater quantities than are obtainable by conventional purification techniques. The key to isolating clones of restriction endonuclease genes is to develop a simple and reliable method to identify such clones within complex `libraries`, i.e. populations of clones derived by `shotgun` procedures, when they occur at frequencies as low as 10⁻³ to 10⁻⁴. Preferably, the method should be selective, such that the unwanted majority of clones are destroyed while the desirable rare clones survive.

Type II restriction-modification systems are being cloned with increasing frequency. The first cloned systems used bacteriophage infection as a means of identifying or selecting restriction endonuclease clones (EcoRII: Kosykh et al., Molec. Gen. Genet. 178: 717-719, (1980); HhaII: Mann et al., Gene 3:97-112, (1978); PstI: Walder et al., Proc. Nat. Acad. Sci. 78:1503-1507, (1981)). Since the presence of restriction-modification systems in bacteria enables them to resist infection by bacteriophages, cells that carry cloned restriction-modification genes can, in principle, be selectively isolated as survivors from libraries that have been exposed to phage. This method has been found, however, to have only limited value. Specifically, it has been found that cloned restriction-modification genes do not always manifest sufficient phage resistance to confer selective survival.

Another cloning approach involves transferring systems initially characterized as plasmid-borne into E. coli cloning plasmids (EcoRV: Bougueleret et al., Nucl. Acid. Res. 12:3659-3676, (1984); PaeR7: Gingeras and Brooks, Proc. Natl. Acad. Sci. USA 80:402-406, (1983); Theriault and Roy, Gene 19:355-359 (1982); PvuII: Blumenthal et al., J. Bacteriol. 164:501-509, (1985)).

A third approach, and one that is being used to clone a growing number of systems, is to select for an active methylase gene (Wilson, U.S. Pat. No. 5,200,333 issued Apr. 6, 1993 and BsuRI: Kiss et al., Nucl. Acid. Res. 13:6403-6421, (1985)).

Since restriction and modification genes are often closely linked, both genes can often be cloned simultaneously. However, this selection does not always yield a complete restriction system, but instead yields only the methylase gene (BspRI: Szomolanyi et al., Gene 10:219-225, (1980); BcnI: Janulaitis et al., Gene 20:197-204 (1982); BsuRI: Kiss and Baldauf, Gene 21:111-119, (1983); and MspI: Walder et al., J. Biol. Chem. 258:1235-1241, (1983)).

A more recent method, the "endo-blue method", has been described for direct cloning of restriction endonuclease genes in E. coli based on an indicator strain of E. coli containing the dinD::lacZ fusion (Fomenkov et al., U.S. Pat. No. 5,498,535, issued Mar. 12, 1996; Fomenkov et al., Nucl. Acids Res. 22:2399-2403, (1994)). This method utilizes the E. coli SOS response following DNA damage caused by restriction endonucleases or non-specific nucleases. A number of thermostable nuclease genes (Tth111I, BsoBI, Tf nuclease) have been cloned by this method.

Another obstacle to cloning these systems in E. coli was discovered in the process of cloning diverse methylases. Many E. coli strains (including those normally used in cloning) have systems that resist the introduction of DNA containing cytosine methylation (Raleigh and Wilson, Proc. Natl. Acad. Sci., USA 83:9070-9074, (1986)). Therefore, it is also necessary to carefully consider which E. coli strain(s) to use for cloning.

Because purified restriction endonucleases and modification methylases are useful tools for creating recombinant molecules in the laboratory, there is a commercial incentive to obtain bacterial strains through recombinant DNA techniques that produce these enzymes in large quantities. Such overexpression strains would also simplify the task of enzyme purification.

SUMMARY OF THE INVENTION

The present invention relates to recombinant DNA molecules encoding a multi-specific methylase gene (bssHIIM1), the BssHII restriction endonuclease gene (bssHIIR), and the cognate BssHII methylase gene (bssHIIM2) from Bacillus stearothermophilus H3, cloned in E. coli. The BssHII multi-specific methylase gene was first cloned in a Sau3AI library using a modified pLITMUS28 vector (New England Biolabs, Inc., Beverly, Mass.) with two BssHII sites. Expression of the multi-specific BssHII methylase renders the two BssHII sites resistant to BssHII digestion. Surprisingly, the cloned methylase also modifies some other sites in addition to the BssHII site (5'GCGCGC3'). Specifically, this methylase also modifies BsrFI (5'RCCGGY3') and HaeII sites (5'RGCGCY3'), and partially modifies EagI (5'CGGCCG3') and MluI sites (5'ACGCGT3'). No mono-specific BssHII methylase was recovered from the partial Sau3AI libraries prepared from B. stearothermophilus H3 genomic DNA.

To facilitate the cloning of the BssHII restriction endonuclease gene, a large amount of BssHII restriction endonuclease protein was purified from B. stearothermophilus H3 cells. The N-terminal amino acid sequence was determined as follows: (Met) Gly Glu Asn Gln Glu Ser Ile Trp Ala Asn Gln Ile Leu Asp Lys Ala Gln Leu Val Ser? Pro Glu Thr His Xaa Gln Asn? Xaa Ala Asp (SEQ ID NO:1). (?=ambiguous calling).

DNA fragments adjacent to the multi-specific BssHII methylase gene (bssHIIM1) were sequenced in the hope of finding BssHII endonuclease gene. Over 5000 bp of DNA surrounding the multi-specific methylase gene was sequenced. Translation of the open reading frames into amino acid sequences did not match the N-terminal amino acid sequence of the BssHII endonuclease protein. It was concluded that the bona fide BssHII restriction endonuclease gene was likely located somewhere else in the B. stearothermophilus H3 chromosome.

In order to determine the location of the BssHII restriction endonuclease gene, two sets of degenerate primers were designed based on the protein sequence and used in PCR to amplify the first 59 bp of the BssHII restriction endonuclease gene from B. stearothermophilus H3 genomic DNA. The PCR product was cloned into pUC19 (ATCC 37254) and the insert was sequenced. A set of inverse PCR primers was designed from the known 59 bp sequence. An inverse PCR product was found in the Sau3AI digested and self-ligated B. stearothermophilus H3 genomic DNA. The inverse PCR DNA was cloned and sequenced. This provided an additional 465 bp of new DNA coding sequence. Another set of inverse PCR primers was designed from the end of the 465 bp sequence and used to amplify the remaining part of the BssHII endonuclease gene from HinfI or RsaI cleaved and self-ligated B. stearothermophilus H3 genomic DNA. The inverse PCR products were cloned and sequenced. After another 730 bp of new sequence, a stop codon was found in the RsaI fragment. The entire BssHII endonuclease gene (bssHIIR) is 1254 bp (59 bp+465 bp+730 bp), encoding the 417-aa BssHII restriction endonuclease with predicted molecular mass of 47 kDa.

To premodify the E. coli chromosome, the multi-specific BssHII methylase gene (bssHIIM1) was cloned in a compatible vector pACYC184 (ATCC 37033). The BssHII endonuclease gene was amplified by PCR. An efficient ribosome binding site and an optimal spacing were engineered in front of the bssHIIR gene. The PCR product was cloned into pUC19. Three isolates carrying the PCR inserts were found. However, no activity was detected in the IPTG-induced cell cultures. It was concluded that the three inserts may have carried mutations in the bssHIIR gene.

Vector pUC19 is a high-copy-number plasmid containing the lac promoter. Expression of genes from the lac promoter is not tightly regulated. Therefore, it was reasoned that a tightly-regulated promoter such as the T7 promoter might be desirable for the expression of the bssHIIR gene. The bssHIIR gene was amplified by PCR with primers flanked by XbaI and BamHI sites. An efficient ribosome binding site and an optimal spacing were engineered in front of the gene. The PCR product was ligated into vector pET21AT (New England Biolabs Inc., Beverly, Mass.), a T7 expression vector with transcription terminators upstream of the T7 promoter. The bssHIIM1 gene was first inserted into the vector pLG339. The endonuclease-carrying plasmid was then transformed into E. coli cell ER2504 [pLysS, pLG-BssHIIM1]. E. coli cells carrying pLysS, pLG-BssHIIM1 and pET21AT-BssHIIR were induced with IPTG and cell extracts were assayed for BssHII endonuclease activity. Cell extracts from twelve isolates were assayed and all twelve clones produced BssHII endonuclease activity. One example is shown in FIG. 3.

Since a type II restriction endonuclease gene and its cognate methylase gene are usually located in close proximity to each other, a set of inverse PCR primers was synthesized based on the end of the restriction endonuclease sequence. Downstream sequences were amplified by inverse PCR and cloned. The bona fide BssHII methylase gene (bssHIIM2) was cloned in two steps of inverse PCR. The entire bssHIIM2 gene was amplified by PCR and cloned into a compatible vector pLG339 (ATCC 37131) derived from pSC101 to premodify the E. coli chromosome. The premodified host ER2417 [pLysP, pLG-BssHIIM2] was transformed with pET21AT-BssHIIR. The resulting strain ER2417 [pLysP, pLG-BssHIIM2, pET21AT-BssHIIR] produced 1×10⁵ units of BssHII restriction endonuclease per gram of wet E. coli cells after IPTG induction.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is the DNA sequence of the multi-specific BssHII methylase gene (bssHIIM1) and its encoded amino acid sequence (SEQ ID NO:2).

FIG. 2 is the DNA sequence of the BssHII restriction endonuclease gene (bssHIIR) and its encoded amino acid sequence (SEQ ID NO:3).

FIG. 3 is the BssHII endonuclease activity assay using cell extract and λ DNA.

FIG. 4 is the DNA sequence of the bona fide BssHII methylase gene (bssHIIM2) and its encoded amino acid sequence (SEQ ID NO:4).

FIG. 5 is the gene organization of the BssHII restriction modification system.

DETAILED DESCRIPTION OF THE INVENTION

In the method described herein, the multi-specific BssHII methylase gene, the BssHII restriction endonuclease gene, and the cognate methylase gene are preferably cloned and expressed using the following steps:

1. The genomic DNA of B. stearothermophilus H3 is purified.

2. The DNA is digested partially with a restriction endonuclease such as Sau3AI, or any of its isoschizomers, that generates a DNA fragment(s) containing the entire BssHII multi-specific methylase gene. Alternatively, one can skip to step 9 to clone the BssHII endonuclease gene directly by PCR and inverse PCR using degenerate primers that are designed from the N-terminus amino acid sequence.

3. The Sau3AI-digested genomic DNA is then ligated into BamHI-cleaved/CIP-treated pLITMUS28 cloning vector with two BssHII sites. The ligated DNA is used to transform an appropriate host, i.e. a HsdR⁻, McrBC⁻, Mrr⁻ strain, such as E. coli strain RR1. The DNA/cell mixtures are then plated on ampicillin media selecting for transformed cells. After incubation, the transformed colonies are harvested and amplified to form the primary cell library.

4. The recombinant plasmids are next purified in toto from the primary cell library to make the primary plasmid library. The purified plasmid library is then digested to completion in vitro with BssHII restriction endonuclease, or any BssHII isoschizomer. The BssHII endonuclease digestion causes the selective destruction of unmodified, non-methylase-containing clones, resulting in an increase in the relative frequency of BssHII methylase-carrying clones.

5. Identification of the multi-specific BssHII methylase clone: The digested plasmid library DNA is transformed back into a host such as E. coli strain RR1 and transformed colonies are again obtained by plating on ampicillin plates. The colonies are picked and their plasmids are prepared and analyzed for the presence of the BssHII methylase gene by incubating purified plasmid DNA in vitro with BssHII endonuclease to determine whether it is resistant to BssHII digestion.

6. Once it has been established that the methylase gene has been cloned, the clone is analyzed by restriction mapping and deletion mapping. The entire insert is then sequenced. Following this approach, one open reading frame corresponding to the BssHII multi-specific methylase gene is found. The cloned methylase also modifies some other sites in addition to the BssHII site (5'GCGCGC3'). The plasmid DNA carrying the multi-specific BssHII methylase gene is also resistant to BsrFI (5'RCCGGY3') and HaeII (5'RGCGCY3') digestion, and is partially resistant to EagI (5'CGGCCG3') and MluI (5'ACGCGT3') digestion. It is concluded that the cloned methylase is a multi-specific methylase.

7. BssHII restriction endonuclease protein is purified in large quantities and the N-terminal amino acid sequence is determined. The N-terminal sequence is as follows: (Met) Gly Glu Asn Gln Glu Ser Ile Trp Ala Asn Gln Ile Leu Asp Lys Ala Gln Leu Val Ser? Pro Glu Thr His Xaa Gln Asn? Xaa Ala Asp (SEQ ID NO:1). (?=ambiguous calling).

8. Based on the above approach, a 5.8 kb DNA fragment surrounding the multi-specific BssHII methylase gene is sequenced using methylase containing clones. The DNA sequence is translated in all six frames and compared to the amino acid sequence of BssHII restriction endonuclease. However, no apparent identity/homology is found between the translated amino acid sequences and the N-terminus amino acid sequence of the purified BssHII restriction endonuclease. It is concluded that the true BssHII restriction endonuclease gene is not located next to the multi-specific methylase.

9. Efforts are made to clone the BssHII endonuclease gene directly by PCR and inverse PCR based on the N-terminal amino acid sequence. Two sets of degenerate primers are designed from the first six amino acid residues (MGENQE) (SEQ ID NO:5) and residues 15 to 20 (DKAQLV) (SEQ ID NO:6). The primers are used to amplify the beginning 59 bp of the BssHII restriction endonuclease gene from the B. stearothermophilus H3 genomic DNA by PCR. Multiple PCR products are found, and DNA fragments in the range of 59-67 bp are gel-purified, cloned into pUC19, and sequenced. One of the two clones contains a translated amino acid sequence that is identical to the expected N-terminal amino acid sequence.

10. Inverse PCR primers are designed from the beginning 59 bp DNA and used to amplify surrounding DNA sequences from B. stearothermophilus H3 genomic DNA by inverse PCR. B. stearothermophilus H3 genomic DNA is digested with the following restriction enzymes with 6 bp recognition sequences: AatII, BamHI, BglII, BspEI, ClaI, EcoRI, HindIII, MluI, NheI, or SphI and restriction enzymes with 4-5 bp recognition sequences: Sau3AI, NlaIII, MspI, MboI, HinP1I, HhaI, HpaII, HaeII, EaeI, or AciI. The digestions give rise to reasonable size template DNA (less than 10 kb) for inverse PCR reaction. The digested DNA samples are self-ligated at a low DNA concentration (less than 2 microgram per ml). The ligated circular DNA is used as templates for inverse PCR reactions. Single PCR products are found in Sau3AI, MboI, and AciI digested and self-ligated genomic DNA. The DNA fragments are cloned into pUC19 and sequenced, which gives rise to 465 bp of new sequence.

11. Inverse PCR primers are designed from the end of 465 bp new sequence and used to amplify additional surrounding DNA. B. stearothermophilus H3 genomic DNA is digested with ApoI, BsaWI, Sau3AI, NlaIII, MspI, MboI, HinP1I, HhaI, HpaII, HaeII, EaeI, TaiI, HinfI, RsaI, or AciI. The digested DNA samples are self-ligated at a low DNA concentration (less than 2 microgram per ml). The ligated circular DNA is used as templates for inverse PCR reaction. Single inverse PCR products are found in HinfI or RsaI digested and self-ligated genomic DNA. The DNA fragments are cloned and sequenced. After 730 bp of additional sequence, a stop codon is found in the sequenced RsaI fragment. The entire BssHII endonuclease gene is 1254 bp (59 bp+465 bp+730 bp), encoding the 417-aa BssHII endonuclease with predicted molecular mass of 47 kDa.

12. The multi-specific BssHII methylase gene is cloned into pACYC184 (ATCC 37033) to premodify E. coli host. The entire BssHII endonuclease gene is amplified by PCR with two primers. An efficient ribosome binding site and 7 bp spacing are engineered before the ATG start codon. The endonuclease gene is cloned into pUC19. Three clones with inserts are found, but no BssHII activity is found in cell extract of IPTG-induced cell cultures.

13. In a second attempt to express the bssHIIR gene, a T7 expression vector pET21AT is used for tightly-regulated expression (F. W. Studier and B. A. Moffatt, J. Mol. Biol. 189:113-130, (1986)). Vector pET21AT carries four copies of transcription terminators upstream of the T7 promoter. The co-transformation of pLysS (F. W. Studier, J. Mol. Biol. 219:37-44, (1991)) further reduces the basal level expression under non-induced condition. The bssHIIM1 gene is amplified by PCR and cloned into a compatible vector pLG339 (ATCC 37131) derived from pSC101. The bssHIIR gene is amplified by PCR and cloned into pET21AT. E. coli host cell for T7 expression, ER2504, is transformed with pLysS, pLG-BssHIIM1, pET21AT-BssHIIR. Twelve isolates are induced with IPTG and cell extracts are prepared and assayed for BssHII restriction endonuclease activity. All clones produce BssHII endonuclease.

14. To clone the methylase gene adjacent to the bssHIIR gene, a set of primers are made to amplify the downstream DNA sequences. Inverse PCR products are found in the inverse PCR reactions of HhaI, HaeIII, and Sau3AI digested and self-ligated genomic DNA. The inverse PCR products are cloned and sequenced. An unfinished open reading frame is found in the newly derived 895 bp sequence and compared to the known genes in Genbank. This unfinished open reading frame has amino acid sequence similarity to the known C⁵ methylases.

15. A second set of inverse PCR primers is synthesized and used to amplify the downstream sequences. Inverse PCR products are found in the inverse PCR reactions of StyI digested and self-ligated DNA. The DNA product is cloned into pUC19 and the insert is sequenced. The entire BssHII methylase gene (bssHIIM2) is found to be 1128 bp, encoding the BssHII methylase protein with molecular mass of 42.2 kDa.

16. The entire bssHIIM2 gene is amplified and inserted into pLG339 vector. The pre-modified E. coli host cell ER2417 [pLysP, pLG339-BssHIIM2] is transformed with pET21AT-BssHIIR. The resulting strain ER2417 [pLysP, pLG339-BssHIIM2, pET21AT-BssHIIR] (NEB#1070) produces 1×10⁵ units of BssHII restriction endonuclease per gram of wet E. coli cells.

A sample of NEB#1070 has been deposited under the terms and conditions of the Budapest Treaty with the American Type Culture Collection on Feb. 24, 1997 and received ATCC Accession Number 98334.

The present invention is further illustrated by the following Example. The Example is provided to aid in the understanding of the invention and is not to be construed as a limitation thereof.

The references cited above and below are hereby incorporated by reference.

EXAMPLE I CLONING OF BSSHII RESTRICTION ENDONUCLEASE GENE

1. Cloning of the multi-specific BssHII methylase gene (bssHIIM1) by using the methylase selection method.

Ten μg of B. stearothermophilus H3 genomic DNA was cleaved partially by 4, 2, 1, 0.5, or 0.25 units of Sau3AI at 37° C. for 30 min. The partially digested DNA was analysed by gel electrophoresis. It was found that 1 unit and 0.5 unit of Sau3AI digestion gave rise to limited partial digestion. The Sau3AI partially digested B. stearothermophilus H3 genomic DNA was ligated into BamHI cleaved/CIP treated pLITMUS28 (a pUC19 derivative with two BssHII sites, New England Biolabs Inc., Beverly, Mass.) at 16° C. overnight. Ligated DNA was transformed into RR1 competent cells and plated on ampicillin plates. A total of about 1×10⁵ cells were derived from the transformation experiment. These cells were pooled together and inoculated into 1 liter LB broth plus Ap and cultured overnight at 37° C. Plasmid DNA was prepared from the primary cell library. 10, 5, 2, and 1 μg of plasmid DNA were cleaved with 40 units of BssHII restriction endonuclease for four hours at 50° C. The BssHII-digested DNA was retransformed into RR1 competent cells. Plasmids were isolated again from cultures of the surviving transformants and digested with BssHII restriction enzyme to see if the plasmid DNA was resistant to BssHII digestion. Thirty-six plasmids were checked for resistance to BssHII digestion. Three resistant clones (#3, #18, and #30) were found. Each carries a genomic DNA insert of about 6 kb. When these plasmids were digested with BsrFI, SacII, EagI, MluI, HaeII, NarI, HhaI, SacI, or EcoO109I, it was found that the plasmid DNA was also resistant or partially resistant to BsrFI, HaeII, EagI, or MluI digestions. Cell extracts were prepared from the three isolates and assayed for BssHII endonuclease activity. No apparent BssHII endonuclease activity was detected in the three cell extracts. The entire multi-specific methylase gene and the surrounding DNA were sequenced. The bssHIIM1 gene sequence is shown in FIG. 1. No apparent restriction endonuclease gene was found. It was concluded that this methylase gene may be a phage encoded multi-specific methylase gene. The same gene was cloned and its sequence has been published (Schumann, J., et al, Gene 157:103-104 (1995)).
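The resistance pattern described above depends on which recognition sequences are present in a given plasmid. As an illustration only, the following minimal Python sketch locates the five recognition sequences quoted in this document in an arbitrary DNA string. The function names and the test sequence are hypothetical, and the sketch says nothing about methylation status; it only finds candidate sites.

```python
import re

# IUPAC ambiguity codes needed for the recognition sequences quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

# Recognition sequences exactly as given in this document. All five are
# palindromic, so scanning the top strand alone finds every site.
SITES = {
    "BssHII": "GCGCGC",
    "BsrFI": "RCCGGY",
    "HaeII": "RGCGCY",
    "EagI": "CGGCCG",
    "MluI": "ACGCGT",
}


def find_sites(seq, recognition):
    """Return 0-based start positions of every match, including overlapping ones."""
    pattern = "".join(IUPAC[base] for base in recognition)
    # A lookahead is used so that overlapping sites (e.g. GCGCGCGC) are all reported.
    return [m.start() for m in re.finditer("(?=" + pattern + ")", seq.upper())]


def site_map(seq):
    """Map each enzyme name to the list of site positions found in seq."""
    return {name: find_sites(seq, rec) for name, rec in SITES.items()}


# Hypothetical test sequence, not taken from the patent.
example = "TTGCGCGCAAACCGGTTT"
print(site_map(example))
```

In this toy sequence the BssHII site does not itself contain a HaeII site, which illustrates that the additional specificities observed for the cloned methylase are not simply sub-sites of 5'GCGCGC3'.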

2. Cloning of bssHIIR gene.

Since the methylase selection method yielded only a multi-specific BssHII methylase, efforts were concentrated on directly cloning the BssHII endonuclease gene. BssHII restriction endonuclease was purified from B. stearothermophilus H3 cells by chromatography and subjected to N-terminal amino acid sequencing. The N-terminal amino acid sequence is as follows:

(Met) Gly Glu Asn Gln Glu Ser Ile Trp Ala Asn Gln Ile Leu Asp Lys Ala Gln Leu Val Ser? Pro Glu Thr His Xaa Gln Asn? Xaa Ala Asp (SEQ ID NO:1) (?=ambiguous calling) (the residues in bold were used to design degenerate primers).

A forward degenerate primer was designed based on the first six amino acids, (Met) Gly Glu Asn Gln Glu (SEQ ID NO:5).

The forward primer sequence is as follows: ##STR1## Two reverse degenerate primers were designed based on the amino acid sequence Asp Lys Ala Gln Leu Val (SEQ ID NO:6). Because the codon coding for Leu has six possibilities (CUG, CUC, CUU, CUA, UUG, and UUA), two degenerate primers were made, one with CTN encoding Leu, the other with TTR encoding Leu. The two reverse degenerate primer sequences are: ##STR2## Two PCR reactions were set up: the forward primer and reverse primer 1 were used in PCR reaction 1, and the forward primer and reverse primer 2 were used in PCR reaction 2. A 59 bp PCR product would be expected if the DNA amplification showed positive results. The 59 bp of the BssHII endonuclease gene encode the first 20 amino acids. After 30 rounds of PCR amplification (95° C. 1 min, 40° C. 1 min, 72° C. 30 sec, 30 cycles), an expected PCR product was detected on an agarose gel. The small DNA product was gel-purified through a 3% low melting agarose gel. The gel slice was treated with β-agarase and the DNA was precipitated and cloned into pUC19. Two clones with insert were sequenced. The sequence of the first 59 bp of the bssHIIR gene is as follows: ##STR3## The encoded amino acid sequence is: Met Gly Glu Asn Gln Glu Ser Ile Trp Ala Asn Gln Ile Leu Asp Lys Ala Gln Leu Val (SEQ ID NO:11), which matches perfectly with the N-terminal amino acid sequence derived by amino acid sequencing of the native protein.
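The reasoning for splitting the Leu codon into CTN and TTR pools can be illustrated with a short back-translation sketch. This is a hedged illustration under stated assumptions, not a reconstruction of the actual primers (the sequences shown as ##STR1## and ##STR2## are not reproduced here). The codon table covers only the residues in the two peptides, the output is the degenerate coding-strand sequence (a real reverse primer would be its reverse complement), and all names are hypothetical.

```python
from itertools import product

# Degenerate codons (standard genetic code, IUPAC notation) for the residues
# that occur in MGENQE and DKAQLV. Leu needs two pools because its six codons
# (CTN and TTR) cannot be written as a single degenerate triplet without
# also admitting non-Leu codons.
DEGENERATE_CODONS = {
    "M": ["ATG"], "G": ["GGN"], "E": ["GAR"], "N": ["AAY"], "Q": ["CAR"],
    "D": ["GAY"], "K": ["AAR"], "A": ["GCN"], "V": ["GTN"],
    "L": ["CTN", "TTR"],
}

# Number of concrete bases represented by each IUPAC symbol used above.
IUPAC_SIZE = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "N": 4}


def degenerate_pools(peptide):
    """Return a list of (degenerate coding sequence, pool size) for a peptide.

    One entry is produced per combination of codon families, so a peptide
    containing one Leu yields two pools.
    """
    pools = []
    for choice in product(*(DEGENERATE_CODONS[aa] for aa in peptide)):
        seq = "".join(choice)
        size = 1
        for base in seq:
            size *= IUPAC_SIZE[base]
        pools.append((seq, size))
    return pools


for peptide in ("MGENQE", "DKAQLV"):
    for seq, size in degenerate_pools(peptide):
        print(peptide, seq, size)
```

Under these assumptions the forward peptide gives a single 64-fold degenerate pool, while the reverse peptide gives a 512-fold pool (CTN) and a 256-fold pool (TTR), which together cover all 768 possible coding sequences.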

To clone the rest of the endonuclease gene, a set of inverse PCR primers was designed from the 59 bp DNA sequence. The inverse PCR primers are as follows: ##STR4##

B. stearothermophilus H3 genomic DNA was digested with the following restriction enzymes with 6 bp recognition sequences, AatII, BamHI, BglII, BspEI, ClaI, EcoRI, HindIII, MluI, NheI, or SphI, and restriction enzymes with 4-5 bp recognition sequences, Sau3AI, NlaIII, MspI, MboI, HinP1I, HhaI, HpaII, HaeII, EaeI, or AciI, for two hours at the required temperatures. The digestions gave rise to reasonable size template DNA (less than 10 kb) for inverse PCR reaction. The digested DNA samples were self-ligated at a low DNA concentration (less than 2 microgram per ml). The ligated circular DNA was extracted twice with phenol-CHCl₃ and once with CHCl₃. The DNA was precipitated with ethanol, resuspended in TE buffer, and used as templates for inverse PCR reactions (95° C. 1 min, 60° C. 1 min, 72° C. 2 min, 30 cycles). PCR products were found in Sau3AI, MboI, and AciI digested and self-ligated genomic DNA. The DNA fragments were cloned into pUC19 and sequenced, which gave rise to 465 bp of new sequence.
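The logic of inverse PCR (digest, self-ligate into a circle, then amplify outward from the known region) can be sketched in a few lines of Python. This is a simplified toy model under several assumptions: the enzyme site is an exact string (as for Sau3AI and MboI, which both recognize GATC), the cut is modelled at the start of the site, primer sequences are ignored, and the genome string, coordinates, and function name are all hypothetical.

```python
def inverse_pcr_flanks(genome, site, known_start, known_end):
    """Model one inverse PCR experiment on a linear genome string.

    genome[known_start:known_end] is the already-sequenced region. The
    function finds the restriction fragment that contains it, treats that
    fragment as self-ligated into a circle, and returns
    (upstream, downstream, product), where product is the new sequence read
    outward from the known region across the ligation junction. Returns None
    if the enzyme gives no usable fragment.
    """
    genome = genome.upper()
    left = genome.rfind(site, 0, known_start)   # nearest site upstream
    right = genome.find(site, known_end)        # nearest site downstream
    if left == -1 or right == -1:
        return None  # no flanking site on one side: no circular template
    if genome.find(site, left + 1) != right:
        return None  # another site lies in between and would cut the known region
    fragment = genome[left:right]               # circularized on self-ligation
    upstream = fragment[:known_start - left]
    downstream = fragment[known_end - left:]
    # Outward-facing primers amplify downstream flank, junction, then upstream flank.
    product = downstream + upstream
    return upstream, downstream, product


# Hypothetical toy genome: known region ATGGGTGAA flanked by two GATC sites.
toy = "CC" + "GATC" + "AAAA" + "ATGGGTGAA" + "TTTT" + "GATC" + "GG"
print(inverse_pcr_flanks(toy, "GATC", 10, 19))
print(inverse_pcr_flanks(toy, "GGATCC", 10, 19))  # no site present, prints None
```

The second call illustrates why a panel of enzymes is screened: an enzyme without suitably placed sites yields no template, so only some digests are expected to give a product.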

To clone the rest of the endonuclease gene, a second set of inverse PCR primers was designed from the end of the 465 bp new sequence. The primer sequences are as follows: ##STR5## The two primers were used to amplify additional surrounding DNA. B. stearothermophilus H3 genomic DNA was digested with ApoI, BsaWI, Sau3AI, NlaIII, MspI, MboI, HinP1I, HhaI, HpaII, HaeII, EaeI, TaiI, HinfI, RsaI, or AciI. The digested DNA samples were self-ligated at a low DNA concentration (less than 2 microgram per ml). The ligated circular DNA was extracted twice with phenol-CHCl₃ and once with CHCl₃. The DNA was precipitated with ethanol and resuspended in TE buffer. The ligated circular DNA was used as templates for inverse PCR reaction (95° C. 1 min, 60° C. 1 min, 72° C. 2 min, 30 cycles). Inverse PCR products were found in HinfI or RsaI digested and self-ligated genomic DNA. The DNA fragments were cloned and sequenced. After 730 bp of additional sequence, a stop codon was found in the sequenced RsaI fragment. The entire BssHII endonuclease gene shown in FIG. 2 is 1254 bp (59 bp+465 bp+730 bp), encoding the 417-aa BssHII endonuclease with predicted molecular mass of 47 kDa, which matches closely with the apparent size (46.5 kDa) on an SDS-PAGE gel.

3. Expression of bssHIIR gene in pUC19

To premodify the E. coli host, the multi-specific BssHII methylase gene was cloned into pACYC184. The entire BssHII endonuclease gene was amplified by PCR with two primers. An efficient ribosome binding site and 7 bp spacing were engineered before the ATG start codon. The endonuclease gene was cloned into pUC19. Three clones with inserts were found, but no apparent BssHII activity was found in cell extract of IPTG-induced cell cultures. It was concluded that the inserts may have acquired mutations during PCR amplification. Expression of the BssHII endonuclease gene from pUC19 may also be unstable due to leaky expression.

4. Expression of bssHIIR gene in the T7 expression vector pET21AT.

In a second attempt to express the bssHIIR gene, a T7 expression vector pET21AT was used for tightly-regulated expression. Vector pET21AT carries four copies of transcription terminators upstream of the T7 promoter. The co-transformation of pLysS further reduces the basal level expression under non-induced condition. The bssHIIM1 gene was amplified by PCR and cloned into a compatible vector pLG339 (derived from pSC101). The two primers designed for bssHIIR gene amplification are as follows: ##STR6##

The bssHIIR gene was amplified by PCR using Vent® polymerase (95° C. 1 min, 60° C. 1 min, 72° C. 1 min, 20 cycles) and cloned into pET21AT. E. coli host cell for T7 expression, ER2504, was transformed with pLysS, pLG-BssHIIM1, pET21AT-BssHIIR. Twelve isolates were cultured to late log growth phase (about 120 klett units). IPTG was added to a final concentration of 0.5 mM. IPTG induction was carried out for three hours and cell extracts were prepared and assayed for BssHII restriction endonuclease activity. All 12 clones produced BssHII endonuclease. One clone was induced in 500 ml cell culture and its cell extract was diluted 10-, 100-, 1000-fold and used to assay BssHII endonuclease activity on lambda DNA. FIG. 3 shows the assay result. The cell extract displayed BssHII activity after 10- and 100-fold dilutions (lanes 2 and 3).

5. Cloning of bssHIIM2 gene.

In type II restriction-modification systems, the restriction endonuclease gene and its cognate methylase gene are usually arranged in close proximity to each other. The DNA upstream of the bssHIIR gene was amplified by inverse PCR, cloned, and sequenced. The upstream sequence has homology to known ribosomal RNA sequences. Therefore, it was reasoned that the true BssHII methylase gene (bssHIIM2) must be located downstream of the restriction endonuclease gene. To clone the methylase gene downstream of the bssHIIR gene, a set of primers was made as follows: ##STR7##

B. stearothermophilus H3 genomic DNA was digested with AciI, AluI, ApoI, BsaWI, EaeI, HhaI, HaeII, HaeIII, MseI, Sau3AI, or Tsp509I. The digested DNA samples were self-ligated at a low DNA concentration (less than 2 micrograms per ml). The ligated circular DNA was extracted twice with phenol-CHCl₃ and once with CHCl₃. The DNA was precipitated with ethanol and resuspended in TE buffer. The ligated circular DNA was used as template for inverse PCR (95° C. 1 min, 55° C. 1 min, 72° C. 2 min, 30 cycles). Inverse PCR products were found in the reactions containing HhaI, HaeIII, and Sau3AI digested and self-ligated genomic DNA. The three inverse PCR products were cloned and sequenced. An unfinished open reading frame was found in the newly derived 895 bp sequence, and it was compared to the known genes in Genbank. This unfinished open reading frame has amino acid sequence similarity to the known C⁵ methylases.
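The logic of the inverse PCR step described above can be illustrated with a minimal Python sketch. It is not part of the disclosed method: the function names and the toy sequences are hypothetical, and the model assumes an ideal intramolecular ligation of a single restriction fragment with each primer site occurring once.

```python
def revcomp(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def inverse_pcr_product(fragment, fwd_primer, rev_primer):
    """Top-strand product expected from a self-ligated (circular) fragment.

    fragment   : linear restriction fragment, top strand, 5'->3'
    fwd_primer : anneals at the downstream edge of the known region and
                 extends toward the fragment's 3' end
    rev_primer : written 5'->3' as synthesized; anneals at the upstream
                 edge of the known region and extends toward the 5' end
    """
    f = fragment.find(fwd_primer)
    r = fragment.find(revcomp(rev_primer))
    if f < 0 or r < 0:
        raise ValueError("primer site not found in fragment")
    r_end = r + len(rev_primer)
    if r_end > f:
        raise ValueError("primers must face outward from the known region")
    # After circularization the forward primer reads through the 3' flank,
    # across the ligation junction, and into the 5' flank up to the
    # reverse primer site.
    return fragment[f:] + fragment[:r_end]


# Toy example (hypothetical sequences, not taken from the BssHII locus).
upstream_unknown = "AAAACCCC"
known_region = "GGATCCTA" + "TTTT" + "TCGAGCAT"
downstream_unknown = "GGGGTTTT"
fragment = upstream_unknown + known_region + downstream_unknown

product = inverse_pcr_product(fragment, "TCGAGCAT", revcomp("GGATCCTA"))
print(product)
```

In the toy product the downstream flank is joined directly to the upstream flank at the ligation junction, which is why sequencing an inverse PCR product reveals DNA on both sides of the known region.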

To clone the rest of the bssHIIM2 gene, a second set of inverse PCR primers was synthesized as follows: ##STR8##

These two primers were used to amplify the downstream sequences. B. stearothermophilus H3 genomic DNA was digested with AvaI, AluI, BsaJI, BstNI, Csp6I, DpnII, EcoO109I, HaeIII, HinfI, MboI, MseI, RsaI, Sau3AI, Sau96I, SpeI, StyI, TaiI, TfiI, Tsp45I, or Tsp509I. The digested DNA samples were self-ligated at a low DNA concentration (less than 2 micrograms per ml). The ligated circular DNA was extracted twice with phenol-CHCl₃ and once with CHCl₃. The DNA was precipitated with ethanol and resuspended in TE buffer. The ligated circular DNA was used as template for inverse PCR (95° C. 1 min, 55° C. 1 min, 72° C. 2 min, 30 cycles). Inverse PCR products were found in the reactions containing HaeIII, HinfI, MboI, Sau3AI, and StyI digested and self-ligated genomic DNA. The inverse PCR product from StyI digested and self-ligated DNA was cloned into pUC19 and the insert was sequenced. A stop codon was found in the newly derived 250 bp DNA sequence. The entire BssHII methylase gene (bssHIIM2), shown in FIG. 4, is 1128 bp, encoding the 375-aa BssHII methylase protein with a molecular mass of 42.2 kDa.
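The relationship between the stated gene length and protein length can be checked with a short calculation. The sketch below is illustrative only; the function name is hypothetical, and the 110 Da average residue mass is a common rule of thumb assumed here, so the mass it gives is a rough estimate and not the reported 42.2 kDa value.

```python
AVG_RESIDUE_MASS_DA = 110.0  # assumed rule-of-thumb average, not from the patent


def protein_length(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an open reading frame of orf_bp bases."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# Gene lengths as given in the sequence listing (stop codon included).
genes = {"bssHIIM1": 1608, "bssHIIR": 1254, "bssHIIM2": 1128}

for name, bp in genes.items():
    aa = protein_length(bp)
    approx_kda = aa * AVG_RESIDUE_MASS_DA / 1000.0
    print(f"{name}: {bp} bp -> {aa} aa, roughly {approx_kda:.1f} kDa")
```

For bssHIIM2, 1128 bp corresponds to 376 codons, one of which is the stop codon, leaving 375 amino acids, in agreement with the text. The rule-of-thumb mass is in the same range as the reported 42.2 kDa; an exact value would depend on the actual amino acid composition.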

6. Expression of bssHIIR gene in bssHIIM2 premodified host cell.

The entire bssHIIM2 gene was amplified using two primers: ##STR9## The bssHIIM2 gene was inserted into plasmid vector pLG339.

The pre-modified E. coli host cell ER2417 [pLysP, pLG-BssHIIM2] was transformed with pET21AT-BssHIIR. The resulting strain ER2417 [pLysP, pLG-BssHIIM2, pET21AT-BssHIIR] was cultured at 30° C. in 500 ml LB plus Ap (50 μg/ml), Km (50 μg/ml), and Cm (30 μg/ml) to late log phase (120-150 Klett units). IPTG was added to the culture to a final concentration of 0.5 mM. Following IPTG induction for three hours, cells were harvested and lysed with lysozyme and sonication. The cell extract was assayed for BssHII activity on lambda DNA. The clone produced 1×10⁵ units of BssHII restriction endonuclease per gram of wet E. coli cells.
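The induction and yield figures above lend themselves to simple arithmetic, sketched below in Python. The 1 M IPTG stock concentration and the wet cell mass argument are assumptions made for illustration; neither is stated in the text, and the function names are hypothetical.

```python
def stock_volume_ml(final_mM, culture_ml, stock_mM):
    """Volume of stock needed for a target final concentration (C1*V1 = C2*V2).

    The small volume added by the stock itself is ignored.
    """
    return final_mM * culture_ml / stock_mM


def total_units(units_per_gram, wet_cell_grams):
    """Total enzyme units expected from a given wet cell mass."""
    return units_per_gram * wet_cell_grams


# 0.5 mM final IPTG in a 500 ml culture, assuming a 1 M (1000 mM) stock.
iptg_ml = stock_volume_ml(final_mM=0.5, culture_ml=500, stock_mM=1000)
print(f"IPTG stock to add: {iptg_ml} ml")

# Reported yield is 1e5 units per gram of wet cells; the pellet mass must be
# measured for each preparation, for example total_units(1e5, measured_grams).
```

Under the assumed stock concentration, 0.25 ml of IPTG stock brings the 500 ml culture to 0.5 mM.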

    __________________________________________________________________________     SEQUENCE LISTING                                                               (1) GENERAL INFORMATION:                                                       (iii) NUMBER OF SEQUENCES: 23                                                  (2) INFORMATION FOR SEQ ID NO:1:                                               (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH: 31 amino acids                                                     (B) TYPE: amino acid                                                           (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: linear                                                           (ii) MOLECULE TYPE: peptide                                                    (iii) HYPOTHETICAL: NO                                                         (iv) ANTI-SENSE: NO                                                            (v) FRAGMENT TYPE: N-terminal                                                  (vi) ORIGINAL SOURCE:                                                          (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:                                        MetGlyGluAsnGlnGluSerIleTrpAlaAsnGlnIleLeuAspLys                               151015                                                                         AlaGlnLeuValSerProGluThrHisXaaGlnAsnXaaAlaAsp                                  202530                                                                         (2) INFORMATION FOR SEQ ID NO:2:                                               (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH: 1608 base pairs                                                    (B) TYPE: nucleic acid                                                         (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: linear 
                                                          (ii) MOLECULE TYPE: cDNA                                                       (iii) HYPOTHETICAL: NO                                                         (iv) ANTI-SENSE: NO                                                            (v) FRAGMENT TYPE:                                                             (vi) ORIGINAL SOURCE:                                                          (ix) FEATURE:                                                                  (A) NAME/KEY: Coding Sequence                                                  (B) LOCATION: 1...1605                                                         (D) OTHER INFORMATION:                                                         (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:                                        ATGACGTTCGACGTAACGCCACAATTACCGGACAGCGGCCTGACCGTC48                             MetThrPheAspValThrProGlnLeuProAspSerGlyLeuThrVal                               151015                                                                         GCGGAACTATTCGCAGGGGGCGGTCTAATGGCGGTCGGCCTCCGCGCT96                             AlaGluLeuPheAlaGlyGlyGlyLeuMetAlaValGlyLeuArgAla                               202530                                                                         GCCGGTTACAATCTCGTATGGGCCAACGACTTCGATAAGTCGGCTTGC144                            AlaGlyTyrAsnLeuValTrpAlaAsnAspPheAspLysSerAlaCys                               354045                                                                         GCCGCCTACCGCCATAATCTCGGCGACCACATCGTACACGGCGACATT192                            AlaAlaTyrArgHisAsnLeuGlyAspHisIleValHisGlyAspIle                               505560                                                                         ACCGCGATTGATCCCGCTGACATTCCGGACACCGACGTGATTGCCGGT240                            ThrAlaIleAspProAlaAspIleProAspThrAspValIleAlaGly                               65707580                                      
                                 GGCCCGCCTTGTCAGGATTACAGCGTCGCGGGAACCGGTGCGGGCGAA288                            GlyProProCysGlnAspTyrSerValAlaGlyThrGlyAlaGlyGlu                               859095                                                                         GAGGGCGAACGCGGCAAGCTCGTGTGGGCGTACCTGCGGATTATCGAG336                            GluGlyGluArgGlyLysLeuValTrpAlaTyrLeuArgIleIleGlu                               100105110                                                                      GCGAAGCGTCCGAAAGCGTTCATATTCGAGAACGTGAAGGGTCTCATT384                            AlaLysArgProLysAlaPheIlePheGluAsnValLysGlyLeuIle                               115120125                                                                      ACGAAAAAGCACCGCCCGACATTCGATGCGTTGCTGAAACAATTTAAG432                            ThrLysLysHisArgProThrPheAspAlaLeuLeuLysGlnPheLys                               130135140                                                                      ATAATCGGGTATAACGTGTCATGGAAACTCATCAACGCATGGGACTAC480                            IleIleGlyTyrAsnValSerTrpLysLeuIleAsnAlaTrpAspTyr                               145150155160                                                                   GGAGTGGCGCAGAAGAGGGAGCGTGTGTTCATCGTGGGCATCCGCGCT528                            GlyValAlaGlnLysArgGluArgValPheIleValGlyIleArgAla                               165170175                                                                      GATCTAGGATTTGCGTTTGAGTTTCCGGAACCGCGACCGGGGGACTAC576                            AspLeuGlyPheAlaPheGluPheProGluProArgProGlyAspTyr                               180185190                                                                      CGGACGCAAGTGCTGCGCGATGTGATCGGGGATTTGCCGGAGCCGGTC624                            ArgThrGlnValLeuArgAspValIleGlyAspLeuProGluProVal                               195200205                                                                      GATACTTGCGGCCGACGCGTGAAGAAGGTGGAACGTGTCGCGGACATG672                    
        AspThrCysGlyArgArgValLysLysValGluArgValAlaAspMet                               210215220                                                                      AACGAGCCGGGGCCGACGGTGACGACACAGTTCCGTTGTCAGACGGTC720                            AsnGluProGlyProThrValThrThrGlnPheArgCysGlnThrVal                               225230235240                                                                   GAGATTACGAATCACAACGGGGGAGTCCCGGCGAAAGAATATCCCGGT768                            GluIleThrAsnHisAsnGlyGlyValProAlaLysGluTyrProGly                               245250255                                                                      CACACTGCGTCGAGTCTCGAAGGACCAGCGAAGAAAACGATTGTGGCG816                            HisThrAlaSerSerLeuGluGlyProAlaLysLysThrIleValAla                               260265270                                                                      GGCGCAAACGGCGTGCCCGGCGGCGCGAATTGTTTCTATCCGAACCAC864                            GlyAlaAsnGlyValProGlyGlyAlaAsnCysPheTyrProAsnHis                               275280285                                                                      GAACGCAAAGAAATCAGCGAAAAGGCGCTCGCAGGCTACGAACGGCGC912                            GluArgLysGluIleSerGluLysAlaLeuAlaGlyTyrGluArgArg                               290295300                                                                      GGAGGACAGGGTGGATTTGGATTCCGCGTGAACCAATGGGACGATCCG960                            GlyGlyGlnGlyGlyPheGlyPheArgValAsnGlnTrpAspAspPro                               305310315320                                                                   TCGCCTACGATATTCGGCCGGATTTTTAACGAAGGAAAGGCGTTCGT1008                            SerProThrIlePheGlyArgIlePheAsnGluGlyLysAlaPheVal                               325330335                                                                      CATCCGGGGCCTATCGAAAATCACGATGAAAAATCGTTCTGGACGCC1056                            HisProGlyProIleGluAsnHisAspGluLysSerPheTrpThrPro                               340345350        
                                                              AAATCCGAATACACCTACGACCAAGCTAATCGTGTACAGTCGTGGGA1104                            LysSerGluTyrThrTyrAspGlnAlaAsnArgValGlnSerTrpAsp                               355360365                                                                      AAGCCAAGTGCCACAATCCCCGCGCATCACAACAGTGGACAGCCGAA1152                            LysProSerAlaThrIleProAlaHisHisAsnSerGlyGlnProAsn                               370375380                                                                      CATCCGCAGTACGCGAACCACGACCGGTACGCTGTCCTCGCGAAGGA1200                            HisProGlnTyrAlaAsnHisAspArgTyrAlaValLeuAlaLysAsp                               385390395400                                                                   AGCGACGTGATTCCGAAAATCCCCGAAGGAGCGTCGAATCGACAAGC1248                            SerAspValIleProLysIleProGluGlyAlaSerAsnArgGlnAla                               405410415                                                                      GCAAAGATAGAACCGGACATTTATTGGTCGGACTATATCCGGGAGAG1296                            AlaLysIleGluProAspIleTyrTrpSerAspTyrIleArgGluSer                               420425430                                                                      CGCGAAAACCCTGCACGCACAATGATCGGTACAGGTAAGCCGAAGAT1344                            ArgGluAsnProAlaArgThrMetIleGlyThrGlyLysProLysIle                               435440445                                                                      CACCCGACACAGCCCCGCCGTTTCACTGTCCGCGAGTGCCTGCGGAT1392                            HisProThrGlnProArgArgPheThrValArgGluCysLeuArgIle                               450455460                                                                      CAATCGGTGCCCGACTGGTACGTGCTGCCGGATGACATTTCGCTATC1440                            GlnSerValProAspTrpTyrValLeuProAspAspIleSerLeuSer                               465470475480                                                                   
GCGCAATACCGTATCGTCGGCAACGGGATAGCGTCGCGCGTCGCGTA1488                            AlaGlnTyrArgIleValGlyAsnGlyIleAlaSerArgValAlaTyr                               485490495                                                                      TTGCTCGGAATTGCCCTGGCGGAACAACTCCGTGCCGCAACGGAATC1536                            LeuLeuGlyIleAlaLeuAlaGluGlnLeuArgAlaAlaThrGluSer                               500505510                                                                      AGCGCAATAGGCGAGCGTTTGATTGCGGATAATACGGACGACTGCGC1584                            SerAlaIleGlyGluArgLeuIleAlaAspAsnThrAspAspCysAla                               515520525                                                                      AACAGTCGGAAGGAGGCGGTCTAG1608                                                   AsnSerArgLysGluAlaVal                                                          530535                                                                         (2) INFORMATION FOR SEQ ID NO:3:                                               (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH: 1254 base pairs                                                    (B) TYPE: nucleic acid                                                         (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: linear                                                           (ii) MOLECULE TYPE: Genomic DNA                                                (iii) HYPOTHETICAL: NO                                                         (iv) ANTI-SENSE: NO                                                            (v) FRAGMENT TYPE:                                                             (vi) ORIGINAL SOURCE:                                                          (ix) FEATURE:                                                                  (A) NAME/KEY: Coding Sequence                                                  (B) LOCATION: 1...1251   
                                                      (D) OTHER INFORMATION:                                                         (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:                                        ATGGGAGAAAACCAAGAATCAATATGGGCAAATCAGATATTGGACAAG48                             MetGlyGluAsnGlnGluSerIleTrpAlaAsnGlnIleLeuAspLys                               151015                                                                         GCCCAACTGGTTATGCCAGAAACTCATGAACAAAATTTAGCTGATACT96                             AlaGlnLeuValMetProGluThrHisGluGlnAsnLeuAlaAspThr                               202530                                                                         TTGATTGACTTATGTTACAATGCAGCAAAGAGAACTAATGTTCCCGTC144                            LeuIleAspLeuCysTyrAsnAlaAlaLysArgThrAsnValProVal                               354045                                                                         GGAATTGCTCTGGCTGCATCATTCGATTTATTGGTAGGAGCCGAGTAT192                            GlyIleAlaLeuAlaAlaSerPheAspLeuLeuValGlyAlaGluTyr                               505560                                                                         TATAGAAACGTAATCAACAGAGGGTGGTGTTACTGTCCAGAACACCAA240                            TyrArgAsnValIleAsnArgGlyTrpCysTyrCysProGluHisGln                               65707580                                                                       TCTTTAATTTTCCCTTATACAAATACTTGCCCTGCATGTGTACTTTCG288                            SerLeuIlePheProTyrThrAsnThrCysProAlaCysValLeuSer                               859095                                                                         GGAAAATTTCATTTTCATCGTTCTAATAAACCGGAATCGGGGAAAATC336                            GlyLysPheHisPheHisArgSerAsnLysProGluSerGlyLysIle                               100105110                                                                      GGTACGGCAACTTCCCGCTTGCTTTGCGTATTTCTGGACAGGCTTTTT384                            GlyThrAlaThrSerArgLeuLeuCysValPheLeuAspArgLeuPhe  
                             115120125                                                                      GTAAAATCATCAAGGAACTTTAAGATTTTCAAAGGCAGTGAACCTATT432                            ValLysSerSerArgAsnPheLysIlePheLysGlySerGluProIle                               130135140                                                                      GATATTTTAATACACGATGAGCAGAAAAACATAATGCTATTGGCTGAA480                            AspIleLeuIleHisAspGluGlnLysAsnIleMetLeuLeuAlaGlu                               145150155160                                                                   GTTAAAGCTGCTCCGCTAATTACCTTACCATTATTGGTTCGATCTGAA528                            ValLysAlaAlaProLeuIleThrLeuProLeuLeuValArgSerGlu                               165170175                                                                      GAAAAAATTACCGATTTAGTCGATGGTGAAATAGTTGAAATACCACAT576                            GluLysIleThrAspLeuValAspGlyGluIleValGluIleProHis                               180185190                                                                      TCTGCCGTAGACAACTCATCTTTATCTTCATCAAATATTTGCTTGCTT624                            SerAlaValAspAsnSerSerLeuSerSerSerAsnIleCysLeuLeu                               195200205                                                                      CTCCCTGTTTTTCATGATGGGAGTTGGCAAAGTAAATTTGTTGAACTT672                            LeuProValPheHisAspGlySerTrpGlnSerLysPheValGluLeu                               210215220                                                                      CAAACGAAAGATGATATATTAACAAATACCATTTGGGCATACGGCCAG720                            GlnThrLysAspAspIleLeuThrAsnThrIleTrpAlaTyrGlyGln                               225230235240                                                                   CTAGAAAATATTTTTAGGGGAAATAATGATCTTTTTGATTTATACTTA768                            LeuGluAsnIlePheArgGlyAsnAsnAspLeuPheAspLeuTyrLeu                               245250255                                                                  
    GATACTTGGAAAAGGGCATTCGAAGCATATCAGGTGGCTTATCACGAA816                            AspThrTrpLysArgAlaPheGluAlaTyrGlnValAlaTyrHisGlu                               260265270                                                                      AAAGATAGATCAAGCAACATTTTTTGGTTGACAAATGCTTGTGGACAA864                            LysAspArgSerSerAsnIlePheTrpLeuThrAsnAlaCysGlyGln                               275280285                                                                      CCTGAGCCAAGACCTGTCGATTGGCCCGCTAGATCGGGAACAGGTTAT912                            ProGluProArgProValAspTrpProAlaArgSerGlyThrGlyTyr                               290295300                                                                      GAATCTGTTTCTGATGGAAAAACTAGTGTGGGCATGGATAGAACTGAC960                            GluSerValSerAspGlyLysThrSerValGlyMetAspArgThrAsp                               305310315320                                                                   GATATTAAAAAAGGAATTTATCAAGTGTTAAAGCTGGGCGCTGAGAG1008                            AspIleLysLysGlyIleTyrGlnValLeuLysLeuGlyAlaGluSer                               325330335                                                                      AAGCCCATAAACCAGCAGTATCAAATTAAAACAGCACTAATATCAAA1056                            LysProIleAsnGlnGlnTyrGlnIleLysThrAlaLeuIleSerAsn                               340345350                                                                      ATTCATGCCGCGAGACACTACGACGAATATCTAACCTCATTGCAAGA1104                            IleHisAlaAlaArgHisTyrAspGluTyrLeuThrSerLeuGlnAsp                               355360365                                                                      GTAGTTTGGGCATTAGATGAAACAGGATTAGCTAAGAAAGCAGGTGA1152                            ValValTrpAlaLeuAspGluThrGlyLeuAlaLysLysAlaGlyGlu                               370375380                                                                      CTTGACAGTGAGACACCAATCTACAACTTATTTGACGGAATCATTTC1200                            
LeuAspSerGluThrProIleTyrAsnLeuPheAspGlyIleIleSer                               385390395400                                                                   TTTACGCGAAATCACCCTAGAGATGAATGGATTAGAGAAAACTTCCA1248                            PheThrArgAsnHisProArgAspGluTrpIleArgGluAsnPheGln                               405410415                                                                      TTCTGA1254                                                                     Phe                                                                            (2) INFORMATION FOR SEQ ID NO:4:                                               (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH: 1128 base pairs                                                    (B) TYPE: nucleic acid                                                         (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: linear                                                           (ii) MOLECULE TYPE: Genomic DNA                                                (iii) HYPOTHETICAL: NO                                                         (iv) ANTI-SENSE: NO                                                            (v) FRAGMENT TYPE:                                                             (vi) ORIGINAL SOURCE:                                                          (ix) FEATURE:                                                                  (A) NAME/KEY: Coding Sequence                                                  (B) LOCATION: 1...1125                                                         (D) OTHER INFORMATION:                                                         (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:                                        ATGAATGGATTAGAGAAAACTTCCAATTCTGACGAATCTTTCGACTTT48                             MetAsnGlyLeuGluLysThrSerAsnSerAspGluSerPheAspPhe                               151015                   
                                                      GTTTTTAATCATGTCTTTCGTCGCTCAGGTTCTGAAGTACAAAAAAGA96                             ValPheAsnHisValPheArgArgSerGlySerGluValGlnLysArg                               202530                                                                         ATTGATGCACTAAAAATCGGTCAAAAAATGCAAGACCTGCCTGAAGAA144                            IleAspAlaLeuLysIleGlyGlnLysMetGlnAspLeuProGluGlu                               354045                                                                         CTATGGCATGACAGCTTTCGTTATTATGTAAAGGAAGACCCAAATAGA192                            LeuTrpHisAspSerPheArgTyrTyrValLysGluAspProAsnArg                               505560                                                                         CGCGGCGGCCCAAACATGCGAATGATACGCCTTGATCCGTCTAAACCT240                            ArgGlyGlyProAsnMetArgMetIleArgLeuAspProSerLysPro                               65707580                                                                       TCTCTAACGGTGACAGGATATATTTTCAATAAGTTTGTGCATCCTTAT288                            SerLeuThrValThrGlyTyrIlePheAsnLysPheValHisProTyr                               859095                                                                         GAGAACCGTTTTATTACCGTGCGTGAAGCTGCAAGACTGCAAGGCTTC336                            GluAsnArgPheIleThrValArgGluAlaAlaArgLeuGlnGlyPhe                               100105110                                                                      CCAGATTCATTAAAATTTGAAGGCTCATTAACTAGTACGCAGATGCAA384                            ProAspSerLeuLysPheGluGlySerLeuThrSerThrGlnMetGln                               115120125                                                                      GTCGGCAATGCCGTGCCAGTACAATTAGCTAAAGCAGTTTTTGAAGCG432                            ValGlyAsnAlaValProValGlnLeuAlaLysAlaValPheGluAla                               130135140                                                                      
GTACTAATTTCTGTTAGAAAATTAGGATATGGCAAAAGAAATTTAACT480                            ValLeuIleSerValArgLysLeuGlyTyrGlyLysArgAsnLeuThr                               145150155160                                                                   GCGTTTAGTCTTTTTAGCGGTGCTGGTGGACTTGATATTGGTGCTGAA528                            AlaPheSerLeuPheSerGlyAlaGlyGlyLeuAspIleGlyAlaGlu                               165170175                                                                      CAAGCTACATATAAATCAATGAAAATAGAAACCTTGGTGACATTAGAT576                            GlnAlaThrTyrLysSerMetLysIleGluThrLeuValThrLeuAsp                               180185190                                                                      AATTGGAAAGACGCTTGTGATACCCTTAGAGGATTCTATCAAGGACGT624                            AsnTrpLysAspAlaCysAspThrLeuArgGlyPheTyrGlnGlyArg                               195200205                                                                      ACAAGTGTTTTGCAAGGAGATATTTCAGAGATACAAGACCCCAAATTA672                            ThrSerValLeuGlnGlyAspIleSerGluIleGlnAspProLysLeu                               210215220                                                                      TTATGGCATCAAGAATCACAACACGATCAGATTCCTGATATTGTATTT720                            LeuTrpHisGlnGluSerGlnHisAspGlnIleProAspIleValPhe                               225230235240                                                                   GGGGGGCCTCCCTGCCAGGCGTTCAGTCAAGCTGGTAAACAAAAGGCA768                            GlyGlyProProCysGlnAlaPheSerGlnAlaGlyLysGlnLysAla                               245250255                                                                      ACAAATGACCCGAGAGGAAACTTGATTTACGAGTACCTCAGATTTATT816                            ThrAsnAspProArgGlyAsnLeuIleTyrGluTyrLeuArgPheIle                               260265270                                                                      GAGAAAATCAACCCTCCATTCTTTGTAATGGAAAATGTAGCGAACTTG864                            
GluLysIleAsnProProPhePheValMetGluAsnValAlaAsnLeu                               275280285                                                                      AAAGGTGTTCAGCGCGGGGAACTTTATCAAGACATTTTGGAGCGCATG912                            LysGlyValGlnArgGlyGluLeuTyrGlnAspIleLeuGluArgMet                               290295300                                                                      TCTAATCTTGGTTATAATGTGACGGTTGCCCCGCTTCTTGCGGCGGAT960                            SerAsnLeuGlyTyrAsnValThrValAlaProLeuLeuAlaAlaAsp                               305310315320                                                                   TATGGTGCACCACAGCTTAGAAAACGTCTAATATTCTTAGGCTGTAA1008                            TyrGlyAlaProGlnLeuArgLysArgLeuIlePheLeuGlyCysLys                               325330335                                                                      AAGGAATTCGGGGTGATGGAACTCCCAGTTCCGACCCATAGTAATAC1056                            LysGluPheGlyValMetGluLeuProValProThrHisSerAsnThr                               340345350                                                                      CCCGATTTATTATCACCAAACCCTTATGTAACAGTGGGGGAAGCCTT1104                            ProAspLeuLeuSerProAsnProTyrValThrValGlyGluAlaPhe                               355360365                                                                      AAAGGTTTACCTAAACTTGTTTAA1128                                                   LysGlyLeuProLysLeuVal                                                          370375                                                                         (2) INFORMATION FOR SEQ ID NO:5:                                               (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH: 6 amino acids                                                      (B) TYPE: amino acid                                                           (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: linear     
                                                      (ii) MOLECULE TYPE: peptide                                                    (iii) HYPOTHETICAL: NO                                                         (iv) ANTI-SENSE: NO                                                            (v) FRAGMENT TYPE: N-terminal                                                  (vi) ORIGINAL SOURCE:                                                          (xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:                                        MetGlyGluAsnGlnGlu                                                             15                                                                             (2) INFORMATION FOR SEQ ID NO:6:                                               (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH: 6 amino acids                                                      (B) TYPE: amino acid                                                           (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: linear                                                           (ii) MOLECULE TYPE: peptide                                                    (iii) HYPOTHETICAL: NO                                                         (iv) ANTI-SENSE: NO                                                            (v) FRAGMENT TYPE: N-terminal                                                  (vi) ORIGINAL SOURCE:                                                          (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:                                        AspLysAlaGlnLeuVal                                                             15                                                                             (2) INFORMATION FOR SEQ ID NO:7:                                               (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH: 16 base pairs                         
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(A) NAME/KEY:
(B) LOCATION: 5
(D) OTHER INFORMATION: N = A, C, G or T
(A) NAME/KEY:
(B) LOCATION: 8 and 14
(D) OTHER INFORMATION: R = A or G
(A) NAME/KEY:
(B) LOCATION: 11
(D) OTHER INFORMATION: Y = C or T
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
ATGGNGARAAYCARGA 16

(2) INFORMATION FOR SEQ ID NO:8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(A) NAME/KEY:
(B) LOCATION: 9
(D) OTHER INFORMATION: N = A, C, G or T
(A) NAME/KEY:
(B) LOCATION: 15
(D) OTHER INFORMATION: R = A or G
(A) NAME/KEY:
(B) LOCATION: 3, 6, and 12
(D) OTHER INFORMATION: Y = C or T
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
ACYAAYTGNGCYTTRTC 17

(2) INFORMATION FOR SEQ ID NO:9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(A) NAME/KEY:
(B) LOCATION: 3 and 9
(D) OTHER INFORMATION: N = A, C, G or T
(A) NAME/KEY:
(B) LOCATION: 15
(D) OTHER INFORMATION: R = A or G
(A) NAME/KEY:
(B) LOCATION: 6 and 12
(D) OTHER INFORMATION: Y = C or T
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
ACNAGYTGNGCYTTRTC 17

(2) INFORMATION FOR SEQ ID NO:10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 59 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Genomic DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
ATGGGAGAAAACCAAGAATCAATATGGGCAAATCAGATATTGGACAAGGCCCAACTGGT 59

(2) INFORMATION FOR SEQ ID NO:11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
MetGlyGluAsnGlnGluSerIleTrpAlaAsnGlnIleLeuAspLys
1           5              10             15
AlaGlnLeuVal
         20

(2) INFORMATION FOR SEQ ID NO:12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
GATATTGGACAAGGCCCAACTGGT 24

(2) INFORMATION FOR SEQ ID NO:13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
TATTGATTCTTGGTTTTCTCCCAT 24

(2) INFORMATION FOR SEQ ID NO:14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
TCCGCTAATTACCTTACCATTATTGGT 27

(2) INFORMATION FOR SEQ ID NO:15:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:
CTTTAACTTCAGCCAATAGCATTATGT 27

(2) INFORMATION FOR SEQ ID NO:16:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 51 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
AGCTCTAGAGGAGGTAAATAAATGGGAGAAAACCAAGAATCAATATGGGCA 51

(2) INFORMATION FOR SEQ ID NO:17:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 42 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:
CGCGGATCCTCAGAATTGGAAGTTTTCTCTAATCCATTCATC 42

(2) INFORMATION FOR SEQ ID NO:18:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
TCTTTCGTCGCTCAGGTTCTGAAGTAC 27

(2) INFORMATION FOR SEQ ID NO:19:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:
TGATTAAAAACAAAGTCGAAAGATTCG 27

(2) INFORMATION FOR SEQ ID NO:20:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:
GGAAAATGTAGCGAACTTGAAAGGTGT 27

(2) INFORMATION FOR SEQ ID NO:21:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:
ACAAAGAATGGAGGGTTGATTTTCTCA 27

(2) INFORMATION FOR SEQ ID NO:22:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 49 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:
CAAGGATCCGGAGGTTAATTAAATGAATGGATTAGAGAAAACTTCCAAT 49

(2) INFORMATION FOR SEQ ID NO:23:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 39 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:
TTCGGATCCTTAAACAAGTTTAGGTAAACCTTTGAAGGC 39
__________________________________________________________________________
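
For readers who wish to check the listing computationally, the following is a minimal Python sketch and not part of the disclosed method. It assumes the standard genetic code and the IUPAC meanings of N, R and Y given in the FEATURE entries of SEQ ID NOs: 7-9. The function names (`degeneracy`, `expand`, `reverse_complement`, `translate`) are illustrative only. The sketch counts the concrete oligonucleotides encoded by each degenerate primer, translates SEQ ID NO:10 for comparison with the N-terminal peptide of SEQ ID NO:11, and compares SEQ ID NOs: 12 and 13 with the two ends of SEQ ID NO:10.

```python
from itertools import product

# IUPAC codes as defined in the FEATURE entries of SEQ ID NOs: 7-9.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}

# Sequences copied from the listing.
SEQ7 = "ATGGNGARAAYCARGA"
SEQ8 = "ACYAAYTGNGCYTTRTC"
SEQ9 = "ACNAGYTGNGCYTTRTC"
SEQ10 = "ATGGGAGAAAACCAAGAATCAATATGGGCAAATCAGATATTGGACAAGGCCCAACTGGT"
SEQ11 = "MGENQESIWANQILDKAQLV"  # one-letter form of SEQ ID NO:11
SEQ12 = "GATATTGGACAAGGCCCAACTGGT"
SEQ13 = "TATTGATTCTTGGTTTTCTCCCAT"

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def degeneracy(primer):
    """Number of concrete sequences a degenerate primer represents."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer))]


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def translate(dna):
    """Translate complete codons in frame 1; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    for name, primer in (("SEQ ID NO:7", SEQ7),
                         ("SEQ ID NO:8", SEQ8),
                         ("SEQ ID NO:9", SEQ9)):
        print(name, "represents", degeneracy(primer), "sequences")
    print("SEQ ID NO:10 translation:", translate(SEQ10))
    print("Agrees with SEQ ID NO:11 over complete codons:",
          SEQ11.startswith(translate(SEQ10)))
    print("SEQ ID NO:13 is the reverse complement of the 5' end of SEQ ID NO:10:",
          reverse_complement(SEQ13) == SEQ10[:24])
    print("SEQ ID NO:12 equals the 3' end of SEQ ID NO:10:",
          SEQ10.endswith(SEQ12))
```

SEQ ID NO:10 is 59 nucleotides long, so only its first 19 complete codons are translated; the final two nucleotides (GT) are the start of the twentieth codon.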

What is claimed is:
 1. Isolated DNA coding for the BssHII restriction endonuclease, wherein the isolated DNA is obtainable from B. stearothermophilus.
 2. A recombinant DNA vector comprising a vector into which a DNA segment coding for the BssHII restriction endonuclease has been inserted.
 3. Isolated DNA coding for the BssHII restriction endonuclease and methylase, wherein the isolated DNA is obtainable from ATCC No. 98334.
 4. A cloning vector which comprises the isolated DNA of claim 3.
 5. A host cell transformed by the vector of claims 2 or 4.
 6. A method of producing a BssHII restriction endonuclease comprising culturing a host cell transformed with the vector of claims 2 or 4 under conditions suitable for expression of said endonuclease.